Exploiting Cluster Analysis for Constructing Multi-dimensional Histograms on Both Static and Evolving Data
Abstract
Density-based clustering techniques are investigated as a basis for constructing histograms in multi-dimensional scenarios, where traditional techniques fail to provide effective data synopses. The main idea is that locating dense and sparse regions can be exploited to partition the data into homogeneous buckets, preventing dense and sparse regions from being summarized into the same aggregate data. The use of clustering techniques to support histogram construction is investigated in the context of both static and dynamic data; in the dynamic case, incremental clustering strategies are mandatory because re-clustering from scratch at each data update is too inefficient.
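The idea of separating dense from sparse regions before bucketing can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name a specific clustering method, so a small textbook DBSCAN stands in for "density-based clustering", and the bucket layout (one bounding box per cluster, one box for the leftover sparse points) and the uniform-spread estimate are assumptions made for illustration. All function names (`dbscan`, `build_buckets`, `estimate`) and parameter values are hypothetical.

```python
from math import dist  # Python 3.8+

def dbscan(points, eps, min_pts):
    """Tiny O(n^2) DBSCAN; returns one label per point (-1 = sparse/noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally sparse
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nj)
    return labels

def build_buckets(points, labels):
    """Assumed layout: one bounding-box bucket per label, with a point count."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    buckets = []
    for lab, pts in groups.items():
        lo = tuple(min(c) for c in zip(*pts))
        hi = tuple(max(c) for c in zip(*pts))
        buckets.append({"label": lab, "lo": lo, "hi": hi, "count": len(pts)})
    return buckets

def estimate(buckets, qlo, qhi):
    """Estimate how many points fall in the box [qlo, qhi],
    assuming points are spread uniformly inside each bucket."""
    total = 0.0
    for b in buckets:
        frac = 1.0
        for lo, hi, a, z in zip(b["lo"], b["hi"], qlo, qhi):
            overlap = min(hi, z) - max(lo, a)
            if overlap < 0:              # query misses the bucket in this dimension
                frac = 0.0
                break
            width = hi - lo
            frac *= overlap / width if width > 0 else 1.0
        total += frac * b["count"]
    return total

# Toy data: a dense 5x5 grid near the origin plus three scattered points.
points = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
points += [(5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]

labels = dbscan(points, eps=0.15, min_pts=3)
buckets = build_buckets(points, labels)
approx = estimate(buckets, (0.0, 0.0), (0.2, 0.4))
print(buckets)
print(approx)
```

Because the dense grid and the scattered points end up in different buckets, a query over the dense area is not diluted by the large, nearly empty region around it. For evolving data, the abstract calls for incremental clustering; this sketch re-clusters from scratch and does not cover that case.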
Related resources
Constructing Two-Dimensional Multi-Wavelet for Solving Two-Dimensional Fredholm Integral Equations
In this paper, a two-dimensional multi-wavelet is constructed in terms of Chebyshev polynomials. The constructed multi-wavelet is an orthonormal basis for the space. By discretization, the two-dimensional Fredholm integral equation is reduced to an algebraic system. The obtained system is solved by the Galerkin method in a subspace, using two-dimensional multi-wavelet bases. Because the bases of subs...
Efficient Selectivity Estimation by Histogram Construction Based on Subspace Clustering
Modern databases have to cope with multi-dimensional queries. For efficient processing of these queries, query optimization relies on multi-dimensional selectivity estimation techniques. These techniques in turn typically rely on histograms. A core challenge of histogram construction is the detection of regions with a density higher than the ones of their surroundings. In this paper, we show th...
Image retrieval using color histograms generated by Gauss mixture vector quantization
Image retrieval based on color histograms requires quantization of a color space. Uniform scalar quantization of each color channel is a popular method for the reduction of histogram dimensionality. With this method, however, no spatial information among pixels is considered in constructing the histograms. Vector quantization (VQ) provides a simple and effective means for exploiting spatial inf...
Methods for regression analysis in high-dimensional data
With advances in science and technology, new and precise methods for measuring, collecting and recording information have been developed, which have led to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation
This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the bene...
Publication date: 2006